

P.O. Box 618

State  College,  Pa.  16801 

Phone:  (814)  238-9621 


Applied  Research  in  Statistics  - Mathematics  - Operations  Research 


STATISTICAL  PROCEDURES  FOR  EXTRACTING 
OPTIMAL  PREDICTOR  VARIABLES  FOR  USE 
IN  AN  IMPACT  ACCELERATION  INJURY 
PREDICTION  MODEL 


Dennis  E.  Smith 
and 

John  J.  Peterson 


TECHNICAL  REPORT  NO.  112-2 


August  1979 


This  study  was  supported  by  the  Office  of  Naval  Research 
under  Contract  No.  N00014-79-C-0128,  Task  No.  NR  207-037 


Reproduction  in  whole  or  in  part  is  permitted 
for  any  purpose  of  the  United  States  Government 


Approved  for  public  release;  distribution  unlimited 




TABLE  OF  CONTENTS 


I. INTRODUCTION

II. STATISTICAL FORMULATION

A. DATA PREPROCESSING

B. DATA ANALYSIS

III. COMPUTATIONAL PROCEDURE

IV. SUMMARY

V. REFERENCES




I. INTRODUCTION




Previous Desmatics technical reports [2, 3, 5] investigated the use
of a logistic function in the development of impact acceleration injury
prediction models based on empirical data. The logistic models are of the form

$$P(\mathbf{x}) = \left(1 + \exp\left[-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)\right]\right)^{-1}$$

where

$\mathbf{x} = (x_1, \ldots, x_k)$ denotes the set of independent variables considered,

$(\beta_0, \beta_1, \ldots, \beta_k)$ denotes a set of unknown parameter values,

and $P(\mathbf{x})$ denotes the true probability of injury corresponding to $\mathbf{x}$.
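For concreteness, the model can be evaluated directly. The Python sketch below computes $P(\mathbf{x})$ from a vector of predictor values and a parameter vector $(\beta_0, \ldots, \beta_k)$. The function name `injury_probability` and the parameter values in the example are hypothetical. They are not estimates from the cited reports.

```python
import math

def injury_probability(x, beta):
    """Evaluate P(x) = (1 + exp[-(beta_0 + sum_i beta_i * x_i)])^(-1).

    x    -- sequence of k predictor values (x_1, ..., x_k)
    beta -- sequence of k + 1 parameters (beta_0, beta_1, ..., beta_k)
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta must have exactly one more entry than x")
    eta = beta[0] + sum(b_i * x_i for b_i, x_i in zip(beta[1:], x))
    # Evaluate the logistic function in a form that cannot overflow exp().
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical parameters, for illustration only.
print(injury_probability([0.0, 0.0], [0.0, 1.0, 2.0]))  # 0.5
```

The two branches compute the same function. They differ only in which exponential they evaluate, so very large positive or negative values of the linear predictor do not raise an overflow error.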
Another report [4] described the construction of "injury" (fatality)
prediction models from actual $-G_x$ accelerator runs using subhuman primates
(Rhesus monkeys) with restrained torso and unrestrained head. The data were
obtained by the Naval Aerospace Medical Research Laboratory (NAMRL) Detachment
as part of its research effort on impact acceleration injury prevention. Two
prediction models were constructed from the data, each based on a different
set of independent variables.

The first model was formed using three variables extracted from head
dynamic response time trace data:

(1) peak head angular acceleration (resultant) measured in radians/sec²,
(2) peak head linear acceleration (resultant) measured in meters/sec²,
and (3) peak head angular velocity (resultant) measured in radians/sec.

The second model was based on two variables describing sled acceleration:

(1) peak sled acceleration measured in G's
and (2) rate of sled acceleration onset measured in G/sec.

Because of differing initial head positions of the experimental
subjects, it was postulated a priori that the sled acceleration profile
would yield less sensitive independent variables than head dynamic
responses would. Using a common data base, two different models were
constructed, one based on sled profile variables and the other based on
head dynamic response variables. Although both models fit reasonably well,
the model based on sled profile variables, contrary to this expectation,
resulted in a much better fit [4].

It is intuitive that a model based on head dynamic response should
provide predictions at least as good as those from a model based on sled
profile, so these results require an explanation. One possible reason for
the anomaly is inadequate extraction of information from the head dynamic
response time trace data.

To determine whether this is in fact the case, care should be taken to
ensure that any set of variables describing head dynamic response comprises
the  best  possible  set  of  injury  predictors.  For  the  particular  restraint 
configuration  used,  the  data  set  used  in  model  construction  has  shown  the 
sled  profile  variables  to  be  almost  perfect  predictors  of  injury  likelihood. 
Thus,  it  is  reasonable  to  extract,  from  the  head  dynamic  response  data, 
injury  predictors  that  are  highly  correlated  with  the  sled  profile  variables. 

The extraction of such predictors from the head dynamic response time
trace data can be achieved by the method of principal components. If there
are several different kinds of time traces, it is desirable to condense the
resulting predictor variables. The statistical method of canonical correlation
analysis can be used to form an optimal condensation of these predictor
variables with respect to the sled profile variables. The predictor variables
will be condensed in the form of two linear combinations canonically
correlated with the sled profile variables. The statistical structure
of principal components analysis and canonical correlation analysis is
described in the following section.

II. STATISTICAL FORMULATION

Principal components are linear combinations of random variables
which have special properties in terms of variance. The first
principal component is the normalized linear combination of the variables
(one whose coefficients have squares summing to one) with maximum variance.
The second principal component is the normalized linear combination of
variables that has maximum variance among all linear combinations
uncorrelated with the first principal component. The third principal
component is the normalized linear combination that has maximum variance
among all linear combinations uncorrelated with the first and second
principal components, and so forth. If there are n variables, it is
possible to find n principal components, with each succeeding principal
component having a variance no larger than that of its predecessor. Usually
the first few principal components will account for most of the variability
in the data. Thus, these linear combinations usually contain most of the
information in the data.
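A minimal Python sketch of this definition follows. It assumes NumPy and takes the components from the sample covariance matrix. Standardizing the variables first, which amounts to using the correlation matrix, is an equally common choice. The eigenvectors of the covariance matrix, sorted by decreasing eigenvalue, are the normalized coefficient vectors, and the eigenvalues are the component variances. The function name `principal_components` is hypothetical.

```python
import numpy as np

def principal_components(X):
    """Principal components of the columns of X.

    X has one row per subject and one column per variable (e.g. the T aligned
    points of one time trace). Returns (variances, coefficients, scores):
    column m of `coefficients` is the m-th normalized coefficient vector
    (its squared entries sum to one), `variances[m]` is that component's
    variance, and components are ordered from largest to smallest variance.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    variances, coefficients = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(variances)[::-1]            # put the largest variance first
    variances = variances[order]
    coefficients = coefficients[:, order]
    scores = centered @ coefficients               # one row of component scores per subject
    return variances, coefficients, scores

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
variances, coefficients, scores = principal_components(X)
print(variances / variances.sum())  # share of total variance per component
```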

For  each  kind  of  dynamic  response  time  trace,  there  is  a corresponding 
set  of  principal  components  that  contain  most  of  the  information  in  the  data 
and,  as  such,  define  a set  of  potential  predictors  for  injury  likelihood. 

This  set  of  predictors  can  be  condensed  by  means  of  canonical  correlation 
into  two  predictors  in  a way  that  describes  the  interrelationship  between 
the  sets  of  principal  components  and  the  sled  profile  variables. 

Canonical correlation analysis is a statistical methodology used to
express the interrelationships between two sets of variables. In this
technical report, concern centers on the set of principal components
derived from the head dynamic response time trace data and the sled
profile variables. The first canonical correlates derived from canonical
correlation analysis are the linear combinations of variables in each set
that have maximum correlation. The second canonical correlates are the
linear combinations of variables in each set that have maximum correlation
among those linear combinations uncorrelated with the first linear
combinations. The number of variables in the smaller of the two sets is the
maximum number of canonical correlate pairs that can exist.
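The following Python sketch illustrates the definition. It assumes NumPy, and the function name `canonical_correlations` is hypothetical. It computes the sample canonical correlations from the usual eigenvalue formulation. It also makes the last point concrete: with $p$ variables in one set and $q$ in the other, at most $\min(p, q)$ of the eigenvalues can be nonzero.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and those of Y.

    X is J x p and Y is J x q (rows = subjects). The squared canonical
    correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx; only the
    largest min(p, q) of them can be nonzero, so that many are returned,
    in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p, q = X.shape[1], Y.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    Sxx, Sxy = S[:p, :p], S[:p, p:]
    Syx, Syy = S[p:, :p], S[p:, p:]
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Syx)   # p x p
    squared = np.sort(np.clip(np.linalg.eigvals(M).real, 0.0, 1.0))[::-1]
    return np.sqrt(squared[:min(p, q)])

# Simulated data: five "principal component" columns, two "sled" columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = X[:, :2] @ np.array([[1.0, 0.3], [0.2, 1.0]]) + rng.normal(size=(200, 2))
print(canonical_correlations(X, Y))   # two values, largest first
```

With the two sled profile variables as `Y`, the function returns exactly two correlations.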

Since  the  sled  profile  variable  set  contains  only  two  variables,  peak 
sled  acceleration  and  rate  of  acceleration  onset,  there  can  only  be  two 
canonical  correlate  pairs.  The  parts  of  the  canonical  correlate  pairs  that 
are  the  linear  combinations  of  the  principal  component  set  are  the  final 
predictors  for  injury  likelihood.  Because  of  the  canonical  correlation 
structure  established  between  the  principal  components  (which  contain  most 
of  the  head  dynamic  response  information)  and  the  sled  profile  variables 
(which  are  excellent  predictors  of  injury  likelihood),  these  final  predictors 
should  be  good  predictors  of  injury  likelihood. 

A. DATA PREPROCESSING

It is assumed that each head dynamic response time trace considered
will, for each subject, be sampled at each of $T$ equally spaced time points.
The observation sampled at time $t$ for time trace $i$ and subject $j$ will be
denoted by $z_{ijt}$. Figure 1 provides a diagrammatic view of the situation.

In the analysis, the observations $z_{ij1}, \ldots, z_{ijT}$ will be considered
as comprising a $T$-dimensional vector $\mathbf{z}_{ij}$.

Before an attempt is made to apply the principal components/canonical
correlation procedure, care must be taken to guard against unsatisfactory
results caused by a lack of preprocessing. Regardless of which head
dynamic responses are selected for examination, peaks within corresponding
time traces should be aligned, since it is reasonable to assume that peaks
may be major contributors to injury. For example, if peak z linear
acceleration were highly correlated with injury, its effect might not be
noted if peaks were not aligned. In such a situation, the effect would be
damped because the peak occurs at different locations in the $\mathbf{z}_{ij}$
vectors.
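Peak alignment can be done in several ways. The sketch below shows one simple possibility. It assumes that, for a given response, each subject's trace is stored as a row of a $J \times T$ NumPy array. Every trace is shifted so that its largest absolute value falls at a common index, here the median peak index, and the vacated points are filled with zeros. The choice of target index and the zero fill are illustrative assumptions, not the report's prescription. The name `align_peaks` is hypothetical.

```python
import numpy as np

def align_peaks(traces, target=None):
    """Shift each row of a J x T array so its peak |value| sits at one index.

    traces -- one row per subject, one column per equally spaced time point
    target -- common peak index; defaults to the median of the observed peaks
    Points shifted in from outside the record are filled with zeros.
    """
    traces = np.asarray(traces, dtype=float)
    J, T = traces.shape
    peaks = np.argmax(np.abs(traces), axis=1)
    if target is None:
        target = int(np.median(peaks))
    aligned = np.zeros_like(traces)
    for j in range(J):
        shift = target - peaks[j]
        if shift >= 0:   # move the trace later in time
            aligned[j, shift:] = traces[j, :T - shift]
        else:            # move the trace earlier in time
            aligned[j, :T + shift] = traces[j, -shift:]
    return aligned

print(align_peaks([[0, 1, 5, 1, 0], [0, 0, 1, 5, 1]]))
# [[0. 1. 5. 1. 0.]
#  [0. 1. 5. 1. 0.]]
```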

Once peak alignment has been carried out, composite observational vectors
as shown in Figure 2 can be formed by linking all head response vectors with
the two corresponding sled profile variables. The values of these sled
variables for subject $j$ will be denoted by $s_{1j}$ (peak sled acceleration)
and $s_{2j}$ (rate of sled acceleration onset).


B.  DATA  ANALYSIS 

For each of the $I$ dynamic response time traces, the corresponding data
should be used to find the principal components. For every type of time
trace, enough principal components should be computed to account for most
of the variability of the data in that time trace. Ideally, the first
few principal components will be enough to account for most of the
variability of a particular time trace data set.
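The report does not quantify "most of the variability." One common rule, assumed here purely for illustration, keeps the smallest number of leading components whose variances sum to at least a chosen fraction of the total. The sketch below uses 90% as its default. The function name and the default fraction are hypothetical choices.

```python
import numpy as np

def components_needed(variances, fraction=0.90):
    """Smallest number of leading principal components whose variances
    account for at least `fraction` of the total variance."""
    v = np.sort(np.asarray(variances, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(v) / v.sum()
    m = int(np.searchsorted(cumulative, fraction)) + 1
    return min(m, len(v))  # guard against round-off when fraction is 1.0

print(components_needed([6.0, 2.0, 1.0, 1.0], fraction=0.85))  # 3
```

Applied to each time trace type separately, the rule may give a different count per trace. The reduced data set can still be formed, although the report takes a common $M$ for notational simplicity.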

For each subject, the principal component scores of that subject's head
dynamic response time traces can be concatenated along with the corresponding
sled profile variables to form the observational vectors in the reduced data
set, as shown in Figure 3. There, for notational simplicity, $M$ principal
components are shown for each kind of time trace. If the number of subjects
($J$) is greater than the dimension ($IM + 2$) of these reduced observational
vectors, then a canonical correlation analysis can be performed on this
condensed data set.
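The assembly of the reduced data set and the subject-count condition can be sketched as follows. The sketch assumes that each time trace type's component scores are held in a NumPy array with one row per subject and columns in decreasing-variance order. `reduced_data_set` is a hypothetical helper, not part of BMDP.

```python
import numpy as np

def reduced_data_set(score_blocks, sled, M):
    """Build the J x (I*M + 2) reduced data set of Figure 3.

    score_blocks -- list of I arrays, one per time-trace type; each is J x (>= M)
                    and holds principal component scores w_ijm, largest-variance
                    component first
    sled         -- J x 2 array of sled profile variables (s_1j, s_2j)
    M            -- number of leading principal components kept per time trace
    """
    sled = np.asarray(sled, dtype=float)
    J = sled.shape[0]
    I = len(score_blocks)
    kept = []
    for W in score_blocks:
        W = np.asarray(W, dtype=float)
        if W.shape[0] != J or W.shape[1] < M:
            raise ValueError("each score block must be J x (at least M)")
        kept.append(W[:, :M])
    # Canonical correlation analysis needs more subjects than variables.
    if J <= I * M + 2:
        raise ValueError(f"need more than {I * M + 2} subjects, got {J}")
    return np.hstack(kept + [sled])

# Simulated scores for I = 2 trace types and J = 10 subjects, keeping M = 3.
rng = np.random.default_rng(2)
blocks = [rng.normal(size=(10, 5)), rng.normal(size=(10, 4))]
sled = rng.normal(size=(10, 2))
print(reduced_data_set(blocks, sled, M=3).shape)  # (10, 8)
```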

Assuming a sufficient number of subjects, a canonical correlation
analysis may be performed on the reduced observational vectors to obtain two
linear combinations of the principal components that are canonically
correlated with two linear combinations of the sled profile variables. The
first canonical correlates are the linear combinations of the principal
component set and the sled profile variable set that have maximum correlation
with each other. The second canonical correlates are the linear combinations
of the principal component set and the sled profile variable set that have
maximum correlation among all those linear combinations uncorrelated with the
first canonical correlates. The resulting two linear combinations of
principal components, taken from the two canonical correlate pairs, are the
injury likelihood predictors.
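A compact sketch of this step follows. It assumes NumPy and a reduced data set laid out as in Figure 3, with the principal component scores in the first columns and the two sled profile variables last. It uses the standard whitening-plus-SVD construction of canonical variates. This is one of several equivalent ways to compute them and not necessarily the algorithm used by BMDP6M. The columns of the returned `U` play the role of the two injury likelihood predictors. `canonical_variates` is a hypothetical name.

```python
import numpy as np

def _inverse_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    values, vectors = np.linalg.eigh(S)
    return vectors @ np.diag(1.0 / np.sqrt(values)) @ vectors.T

def canonical_variates(reduced, n_first):
    """Canonical correlations and canonical variate scores.

    reduced -- J x (n_first + n_second) array; the first n_first columns are the
               "FIRST" set (principal component scores), the rest the "SECOND"
               set (sled profile variables)
    Returns (rho, U, V): rho holds the canonical correlations in decreasing
    order, and column k of U (of V) holds the k-th canonical variate of the
    first (second) set for every subject, scaled to unit sample variance.
    """
    Z = np.asarray(reduced, dtype=float)
    X = Z[:, :n_first] - Z[:, :n_first].mean(axis=0)
    Y = Z[:, n_first:] - Z[:, n_first:].mean(axis=0)
    n = Z.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    Kx, Ky = _inverse_sqrt(Sxx), _inverse_sqrt(Syy)
    left, rho, right_t = np.linalg.svd(Kx @ Sxy @ Ky)
    k = min(X.shape[1], Y.shape[1])   # number of canonical pairs
    A = Kx @ left[:, :k]              # coefficients for the first set
    B = Ky @ right_t.T[:, :k]         # coefficients for the second set
    return rho[:k], X @ A, Y @ B

# Simulated data (not NAMRL data): 60 subjects,
# 4 principal component scores and 2 sled profile variables.
rng = np.random.default_rng(0)
W = rng.normal(size=(60, 4))
S = W[:, :2] @ np.array([[1.0, 0.5], [0.2, 1.0]]) + 0.5 * rng.normal(size=(60, 2))
rho, U, V = canonical_variates(np.hstack([W, S]), n_first=4)
print(rho)   # two canonical correlations, largest first
```

Each column of `U` has unit sample variance, and the two columns are uncorrelated. This is the property the text requires of the second canonical correlates relative to the first.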


III.  COMPUTATIONAL  PROCEDURE 


The  principal  components  and  the  canonical  correlation  analyses  can 
be  conducted  with  the  aid  of  one  of  the  several  sophisticated  computer 
packages  currently  available.  In  this  section,  the  statistical  package 
BMDP  [1]  is  used  for  purposes  of  illustration.  Within  BMDP,  the  program 
BMDP4R  can  be  used  to  perform  the  principal  components  analysis  and  the 
program  BMDP6M  can  be  used  to  perform  the  canonical  correlation  analysis. 
BMDP4R  should  be  used  on  each  time  trace  data  set  to  obtain  an  output  of 
the  corresponding  principal  components.  A subset  of  these  outputs  can 
then  be  properly  arranged  and  input  to  the  BMDP6M  program  to  compute  the 
canonical  correlates. 

The BMDP4R program computes the principal components and regresses
them on a user-specified dependent variable. The regression of the principal
[...] entry of the principal components into the regression analysis.
Specifying CORRELATION in the regression paragraph causes the principal
components to be entered in order of the absolute value of their correlations
with the dependent variable, largest first. Specifying EIGENVALUE causes the
principal components to be entered in order of the magnitude of their
variances. Regression on the principal components is not an important aspect
of this analysis. At this point, interest centers on the variances of the
principal components, and specifying EIGENVALUE conveniently causes the
principal component coefficients to be output in order of the magnitude of
the variances of the principal components.
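The two entry orders can be mimicked outside BMDP. The Python sketch below illustrates the ordering rules only and does not reproduce BMDP4R's regression output. The function name `entry_order` is hypothetical, and so is its `by` argument, which borrows the BMDP keywords.

```python
import numpy as np

def entry_order(variances, scores, y, by="EIGENVALUE"):
    """Order in which principal components would enter a regression on y.

    by="EIGENVALUE"  -- decreasing component variance
    by="CORRELATION" -- decreasing |correlation| between component score and y
    """
    if by == "EIGENVALUE":
        key = np.asarray(variances, dtype=float)
    elif by == "CORRELATION":
        scores = np.asarray(scores, dtype=float)
        y = np.asarray(y, dtype=float)
        key = np.array([abs(np.corrcoef(scores[:, m], y)[0, 1])
                        for m in range(scores.shape[1])])
    else:
        raise ValueError("by must be 'EIGENVALUE' or 'CORRELATION'")
    return [int(m) for m in np.argsort(key)[::-1]]

# Simulated scores in which the smallest-variance component drives y.
rng = np.random.default_rng(3)
scores = rng.normal(size=(100, 3)) * np.array([3.0, 2.0, 1.0])
y = scores[:, 2] + 0.1 * rng.normal(size=100)
print(entry_order([9.0, 4.0, 1.0], scores, y))                    # [0, 1, 2]
print(entry_order([9.0, 4.0, 1.0], scores, y, by="CORRELATION"))
```

In this simulated case, the component with the smallest variance enters first under the correlation rule. This shows why the two options can produce very different listings.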

The principal component coefficients will be denoted by $c_{imt}$. These are
estimated parameter values. Specifying SCORE in the BMDP4R print paragraph
causes the output of the principal component scores, i.e., the linear
combinations of time trace points for each subject. The principal component
scores will be denoted by $w_{ijm}$. If the first few principal components
account for most of the variability of a given time trace data set, only the
principal component scores corresponding to these first few principal
components will be needed to form the reduced data set as shown previously
in Figure 3. All the principal component scores can be output to a BMDP
file. Note that the BMDP4R program does not output both of the sled profile
variables along with the principal component scores. A BMDP4R program must be
run for each type of head dynamic response time trace, and for each BMDP4R
output, only the principal component scores corresponding to the first few
principal components need to be used to form the data set shown in Figure 3.
The data manipulation required to form the data set shown in that figure can
be done manually or by using BIMEDT, a Fortran transformation program of
BMDP, in conjunction with the output files created by the BMDP4R programs.


The BMDP6M program may be used to compute the canonical correlate
pairs from the reduced data set shown in Figure 3. In the canonical
paragraph, the "FIRST" set of variables should be the principal component
scores for each of the time traces, taken from the reduced data set in that
figure. The "SECOND" set of variables should be the sled profile variables.
In the BMDP6M print paragraph, the parameters COEF and CANV should be
specified. COEF causes the output of the coefficients of the canonical
correlates, and CANV causes the output of the canonical correlate scores,
i.e., the linear combinations of the principal component scores for each
subject and the linear combinations of the sled profile variables for each
subject. Let $U_{1j}$ and $U_{2j}$ represent the two linear combinations of
principal component scores for the $j$th subject. Let $V_{1j}$ and $V_{2j}$
represent the two linear combinations of the sled profile variables for the
$j$th subject. Then $U_{1j}$ and $U_{2j}$ are the candidates for the final
predictors of injury likelihood for the $j$th subject. (Figure 5 provides a
summary of the overall procedure.)
Figure 5: A Summary of the Overall Procedure

In Figure 5:

$I$ is the number of time trace types.

$M$ is the number of principal components used for each time trace.

$a_{im}$, $b_{im}$ are coefficients estimated from the data set in Figure 3.

$w_{ijm}$ is the $m$th principal component score for the $j$th subject,
corresponding to the $i$th time trace.

$z_{ijt}$ is the time trace point at time $t$ from the $i$th time trace for
subject $j$.

$c_{imt}$ is a coefficient estimated from the $z_{ijt}$ data in Figure 4.

$U_{1j}$, $U_{2j}$ are the final predictors of injury likelihood for subject $j$.
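The definitions listed with Figure 5 suggest how its quantities fit together. The equations below assume that the figure expressed each stage as a plain linear combination, and they ignore any centering or scaling applied by BMDP4R or BMDP6M. Under that assumption, the two stages are:

$$w_{ijm} = \sum_{t=1}^{T} c_{imt}\, z_{ijt}, \qquad U_{1j} = \sum_{i=1}^{I}\sum_{m=1}^{M} a_{im}\, w_{ijm}, \qquad U_{2j} = \sum_{i=1}^{I}\sum_{m=1}^{M} b_{im}\, w_{ijm}.$$

Substituting the first expression into the second gives

$$U_{1j} = \sum_{i=1}^{I}\sum_{t=1}^{T}\left(\sum_{m=1}^{M} a_{im}\, c_{imt}\right) z_{ijt},$$

and similarly for $U_{2j}$ with $b_{im}$ in place of $a_{im}$. Each final predictor is therefore itself a weighted sum of the aligned time trace points.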

IV. SUMMARY


Two  statistical  criteria  are  employed  in  defining  the  independent 
variables  for  the  logistic  prediction  model.  One  is  a high  percentage  of 
time  trace  variability  associated  with  a principal  component;  the  other  is 
good  correlation  of  the  principal  components  with  the  sled  profile  variables. 
The  logistic  injury  model  is  to  be  used,  in  part,  to  study  how  changes  in 
head  dynamic  response  affect  changes  in  injury  likelihood.  The  method  of 
principal  components  is  used  to  arrange  and  condense  the  head  dynamic 
response  time  trace  data  into  a form  that  accounts  for  the  variability  of 
the  data  in  an  optimal  manner.  The  first  few  principal  components  of  any 
one  kind  of  time  trace  should  account  for  most  of  the  head  dynamic  response 
variability  in  that  type  of  time  trace. 

Since  the  sled  profile  variables  were  found  to  be  excellent  predictors 
of  injury  likelihood,  it  is  desirable  to  form  statistics  of  the  principal 
component  data  that  are  highly  correlated  with  the  sled  profile  variables. 
This  can  be  achieved  by  the  method  of  canonical  correlation  analysis.  In 
short,  the  method  of  principal  components  is  used  to  epitomize  the  head 
dynamic  response  time  trace  data  and  the  method  of  canonical  correlations 
is  used  to  form  statistics  of  the  principal  components  that  should  predict 
injury likelihood well.